Improving Needle Penetration via Precise Rotational Insertion Using Iterative Learning Control

Foroutani, Yasamin, Mousavi-Motlagh, Yasamin, Barzelay, Aya, Tsao, Tsu-Chin

arXiv.org Artificial Intelligence

Abstract--Achieving precise control of robotic tool paths is often challenged by inherent system misalignments, unmodeled dynamics, and actuation inaccuracies. This work introduces an Iterative Learning Control (ILC) strategy to enable precise rotational insertion of a tool during robotic surgery, improving penetration efficacy and safety compared to straight insertion, as tested in subretinal injection. A 4 degree of freedom (DOF) robot manipulator is used, where misalignment of the fourth joint complicates the simple application of needle rotation, motivating an ILC approach that iteratively adjusts joint commands based on positional feedback. The process begins with calibrating the forward kinematics for the chosen surgical tool to achieve higher accuracy, followed by successive ILC iterations guided by Optical Coherence Tomography (OCT) volume scans to measure the error and refine control inputs. Experimental results on subretinal injection tasks in ex vivo pig eyes show that the optimized trajectory achieved higher success rates in tissue penetration and subretinal injection than straight insertion, demonstrating the effectiveness of ILC in overcoming misalignment challenges. This approach also offers potential for other high-precision robot tasks requiring controlled insertions. Accurate and precise control of movement is fundamental to many scientific fields [1], but it becomes even more critical in surgical applications, where even minor deviations can significantly impact outcomes. Surgical procedures often demand sub-millimeter accuracy, especially in areas involving delicate tissues and confined spaces, such as ophthalmology. However, consistently achieving this level of precision is challenging due to the inherent limitations of human motor skills, such as involuntary tremors and fatigue [2]. These limitations are amplified in intraocular microsurgery, which requires not only steady hands but also enhanced sensory feedback and hand-eye coordination.
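
The core of the method is the classic ILC idea: reuse the error measured on one trial to correct the command for the next. As a minimal sketch (the learning gain, trajectory parameterization, and the constant-offset plant standing in for the misaligned joint and OCT-based error measurement are illustrative, not the paper's actual formulation), a P-type update looks like this:

```python
import numpy as np

def ilc_update(u_prev, e_prev, gamma=0.5):
    """One P-type Iterative Learning Control update.

    u_prev: joint command trajectory from the previous trial, shape (T,)
    e_prev: measured tracking error from the previous trial, shape (T,)
    gamma:  learning gain (illustrative value, not from the paper)
    """
    return u_prev + gamma * e_prev

# Toy example: learn a feedforward command that tracks a reference
# through an unknown constant actuation offset (a stand-in for the
# misaligned fourth joint; the real error comes from OCT volume scans).
T = 100
reference = np.sin(np.linspace(0, 2 * np.pi, T))
offset = 0.3  # unmodeled misalignment

u = np.zeros(T)
for trial in range(20):
    y = u + offset          # plant: command plus unknown offset
    e = reference - y       # positional feedback error
    u = ilc_update(u, e)    # refine command for the next trial

print(f"final max |error|: {np.max(np.abs(reference - y)):.4f}")
```

Because each trial repeats the same task, the error contracts geometrically whenever the gain is matched to the plant, which is what lets ILC compensate for repeatable misalignment that pure feedback cannot anticipate.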


DIRIGENt: End-To-End Robotic Imitation of Human Demonstrations Based on a Diffusion Model

Spisak, Josua, Kerzel, Matthias, Wermter, Stefan

arXiv.org Artificial Intelligence

There has been substantial progress in humanoid robots, with new skills continuously being taught, ranging from navigation to manipulation. While these abilities may seem impressive, the teaching methods often remain inefficient. To enhance the process of teaching robots, we propose leveraging a mechanism effectively used by humans: teaching by demonstration. In this paper, we introduce DIRIGENt (DIrect Robotic Imitation GENeration model), a novel end-to-end diffusion approach that directly generates joint values from observations of human demonstrations, enabling a robot to imitate these actions without any existing mapping between the robot and humans. We create a dataset in which humans imitate a robot and then use this collected data to train a diffusion model that enables the robot to imitate humans. Three aspects form the core of our contribution. First is our novel dataset with natural pairs between human and robot poses, which allows our approach to imitate humans accurately despite the gap between their anatomies. Second, the diffusion input to our model alleviates the challenge of redundant joint configurations by limiting the search space. Finally, our end-to-end architecture from perception to action leads to improved learning capability. Through our experimental analysis, we show that combining these three aspects allows DIRIGENt to outperform existing state-of-the-art approaches in generating joint values from RGB images.
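
To make the mechanism concrete, here is a heavily simplified sketch of conditional diffusion over a joint-value vector: a small MLP predicts the injected noise given an image embedding and a timestep, and ancestral sampling denoises random joints into a configuration. All sizes, the noise schedule, and the embedding are placeholders; DIRIGENt's actual architecture is end-to-end from raw perception.

```python
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Predicts the noise added to a joint-value vector, conditioned on
    an image embedding and a diffusion timestep (illustrative sizes)."""
    def __init__(self, n_joints=7, img_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_joints + img_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_joints),
        )

    def forward(self, noisy_joints, img_emb, t):
        x = torch.cat([noisy_joints, img_emb, t], dim=-1)
        return self.net(x)

@torch.no_grad()
def sample_joints(model, img_emb, n_joints=7, steps=50):
    """DDPM-style ancestral sampling of a joint configuration."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, n_joints)
    for t in reversed(range(steps)):
        eps = model(x, img_emb, torch.full((1, 1), t / steps))
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

model = JointDenoiser()
joints = sample_joints(model, img_emb=torch.randn(1, 512))
print(joints.shape)  # torch.Size([1, 7])
```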


Clarke Transform and Encoder-Decoder Architecture for Arbitrary Joints Locations in Displacement-Actuated Continuum Robots

Grassmann, Reinhard M., Burgner-Kahrs, Jessica

arXiv.org Artificial Intelligence

Abstract-- In this paper, we consider an arbitrary number of joints and their arbitrary locations along the center line of a displacement-actuated continuum robot. To achieve this, we revisit the derivation of the Clarke transform, leading to a formulation capable of considering arbitrary joint locations. The proposed modified Clarke transform opens new opportunities in mechanical design and algorithmic approaches beyond the current limiting dependency on symmetrically arranged joint locations. By presenting an encoder-decoder architecture based on the Clarke transform, joint values can be transformed between different robot designs, enabling the use of an analogous robot design and direct knowledge transfer. To demonstrate its versatility, applications of control and trajectory generation in simulation are presented, which can be easily integrated into an existing framework designed, for instance, for three symmetrically arranged joints.
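
For intuition, the Clarke transform compresses n coupled joint displacements into two coordinates. With arbitrary joint locations the same idea can be written as a least-squares fit; this is only a sketch of that idea, not the paper's exact modified transform. Chaining encode and decode then transfers joint values between robot designs:

```python
import numpy as np

def clarke_encode(rho, psi):
    """Map n joint displacements rho at angular locations psi (radians)
    around the cross-section to two Clarke coordinates via a
    least-squares pseudoinverse, which also covers non-symmetric joint
    locations. For symmetric locations this reduces to the standard
    (2/n) * [cos psi; sin psi] transform."""
    A = np.column_stack([np.cos(psi), np.sin(psi)])  # shape (n, 2)
    return np.linalg.pinv(A) @ rho                   # shape (2,)

def clarke_decode(clarke, psi):
    """Map Clarke coordinates back to joint displacements at
    (possibly different) joint locations psi."""
    A = np.column_stack([np.cos(psi), np.sin(psi)])
    return A @ clarke

# Transfer joint values from a 3-joint symmetric design to a
# 4-joint design with arbitrary joint locations (values illustrative).
psi_src = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
psi_dst = np.array([0.1, 1.3, 2.9, 4.6])   # arbitrary locations
rho_src = np.array([0.02, -0.01, -0.01])   # displacements in meters

rho_dst = clarke_decode(clarke_encode(rho_src, psi_src), psi_dst)
print(rho_dst)
```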


D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping

Wei, Zhenyu, Xu, Zhixuan, Guo, Jingxiang, Hou, Yiwen, Gao, Chongkai, Cai, Zhehao, Luo, Jiayu, Shao, Lin

arXiv.org Artificial Intelligence

Dexterous grasping is a fundamental yet challenging skill in robotic manipulation, requiring precise interaction between robotic hands and objects. In this paper, we present D(R,O) Grasp, a novel framework that models the interaction between the robotic hand in its grasping pose and the object, enabling broad generalization across various robot hands and object geometries. Our model takes the robot hand's description and object point cloud as inputs and efficiently predicts kinematically valid and stable grasps, demonstrating strong adaptability to diverse robot embodiments and object geometries. Extensive experiments conducted in both simulated and real-world environments validate the effectiveness of our approach, with significant improvements in success rate, grasp diversity, and inference speed across multiple robotic hands. Our method achieves an average success rate of 87.53% in simulation in less than one second, tested across three different dexterous robotic hands. In real-world experiments using the LeapHand, the method also demonstrates an average success rate of 89%. D(R,O) Grasp provides a robust solution for dexterous grasping in complex and varied environments. The code, appendix, and videos are available on our project website at https://nus-lins-lab.github.io/drograspweb/.
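
As the name suggests, the unified representation couples points on the robot hand with points on the object. A hedged sketch of one plausible reading, a pairwise hand-object distance matrix (the abstract does not spell out the exact representation or the grasp-recovery step, so the sampling and shapes below are assumptions):

```python
import numpy as np

def dro_matrix(robot_points, object_points):
    """Pairwise Euclidean distances between points sampled on the robot
    hand in a candidate grasp pose and points on the object point cloud.
    Sketch of a relative hand-object encoding; the paper's model predicts
    the interaction representation rather than computing it like this."""
    diff = robot_points[:, None, :] - object_points[None, :, :]
    return np.linalg.norm(diff, axis=-1)   # shape (n_robot, n_object)

robot_pts = np.random.rand(128, 3)    # points sampled on hand links
object_pts = np.random.rand(256, 3)   # object point cloud
D = dro_matrix(robot_pts, object_pts)
print(D.shape)  # (128, 256)
```

Encoding the grasp relative to the object, rather than as absolute hand joint values, is what makes such a representation transferable across hand embodiments.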


Diffusing in Someone Else's Shoes: Robotic Perspective Taking with Diffusion

Spisak, Josua, Kerzel, Matthias, Wermter, Stefan

arXiv.org Artificial Intelligence

Humanoid robots can benefit from their similarity to the human shape by learning from humans. When humans teach other humans how to perform actions, they often demonstrate the actions, and the learner can try to imitate the demonstration. Being able to mentally map a demonstration seen from a third-person perspective to how it should look from a first-person perspective is fundamental to this ability in humans. As this is a challenging task, it is often simplified for robots by creating the demonstration in the first-person perspective; such demonstrations require more effort to produce but are easier to imitate. We introduce a novel diffusion model that enables the robot to learn directly from third-person demonstrations. Our model learns to generate the first-person perspective from the third-person perspective by translating the sizes and rotations of objects and the environment between the two perspectives. This allows us to combine the benefits of easy-to-produce third-person demonstrations and easy-to-imitate first-person demonstrations. The model can either render the first-person perspective as an RGB image or calculate the joint values directly. Our approach significantly outperforms other image-to-image models on this task.
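
What the model learns implicitly corresponds to classical re-projection: the same 3D scene rendered under a different camera pose changes the apparent position, size, and rotation of every object. For intuition only, here is the geometric version when camera poses, intrinsics, and 3D points are known; the diffusion model must infer this mapping from images alone, and the numbers below are made up for illustration:

```python
import numpy as np

def reproject_point(p_world, T_cam_from_world, K):
    """Project a 3D world point into a camera with pose T (4x4 world-to-
    camera transform) and intrinsics K (3x3). Returns pixel coordinates."""
    p_h = np.append(p_world, 1.0)
    p_cam = (T_cam_from_world @ p_h)[:3]
    uv = K @ p_cam
    return uv[:2] / uv[2]

# The same scene point lands at a different pixel (and apparent scale)
# in the third-person observer view versus the robot's own view.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
T_third = np.eye(4); T_third[2, 3] = 2.0   # observer camera, 2 m back
T_first = np.eye(4); T_first[2, 3] = 0.5   # robot's camera, much closer

p = np.array([0.1, 0.0, 0.0])              # object point in world frame
print(reproject_point(p, T_third, K))      # third-person pixel
print(reproject_point(p, T_first, K))      # first-person pixel
```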


Multi-Objective Trajectory Planning with Dual-Encoder

Zhang, Beibei, Xiang, Tian, Mao, Chentao, Zheng, Yuhua, Li, Shuai, Niu, Haoyi, Xi, Xiangming, Bai, Wenyuan, Gao, Feng

arXiv.org Artificial Intelligence

Time-jerk optimal trajectory planning is crucial to advancing the performance of robotic arms in dynamic tasks. Traditional methods rely on solving complex nonlinear programming problems, introducing significant delays in generating optimized trajectories. In this paper, we propose a two-stage approach to accelerate time-jerk optimal trajectory planning. First, we introduce a dual-encoder transformer model to establish a good preliminary trajectory. This trajectory is then refined through sequential quadratic programming to improve its optimality and robustness. Our approach outperforms the state of the art by up to 79.72% in reducing trajectory planning time. Compared with existing methods, it also shrinks the optimality gap, with the objective function value decreasing by up to 29.9%.
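
The second stage is standard constrained optimization. As a minimal sketch of SQP refinement (illustrative weights and limits, a 1-DOF waypoint path, and a uniform warm start standing in for the transformer's output), one can optimize segment durations for a weighted time-plus-jerk objective with SLSQP:

```python
import numpy as np
from scipy.optimize import minimize

# Fixed joint waypoints (1-DOF for clarity); decision variables are the
# segment durations h_i. Weights and limits are illustrative.
q = np.array([0.0, 0.4, 1.2, 1.5, 2.0])
w_time, w_jerk, v_max = 1.0, 0.05, 1.5

def objective(h):
    v = np.diff(q) / h                          # segment velocities
    a = np.diff(v) / ((h[:-1] + h[1:]) / 2)     # accelerations at knots
    j = np.diff(a) / h[1:-1]                    # finite-difference jerk
    return w_time * np.sum(h) + w_jerk * np.sum(j ** 2)

# Warm start: in the paper this comes from the dual-encoder transformer;
# here uniform durations stand in for it.
h0 = np.full(len(q) - 1, 1.0)

res = minimize(
    objective, h0, method="SLSQP",
    bounds=[(0.05, None)] * len(h0),
    constraints=[{"type": "ineq",
                  "fun": lambda h: v_max - np.abs(np.diff(q)) / h}],
)
print("segment durations:", res.x, "objective:", res.fun)
```

A good warm start matters here because SQP converges locally; starting near the optimum is what yields the reported reduction in planning time.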


The Multi-fingered Kinematic Model for Dual-arm Manipulation

Li, Jingyi

arXiv.org Artificial Intelligence

A planar kinematic model in the hand-object coordinate system for bimanual manipulation is presented. It can compute and determine the finger configurations. In our experiment, the desired positions, given as model inputs, successfully generated valid joint values for bimanual manipulation. Abstract-- This paper presents a planar finger kinematic model for a dual-arm robot to determine manipulation strategies. The first step is to build a model based on planar geometric features of coordinated and rolling motion so that the robot can select the finger configurations. For the hand-object model, we consider the distances between the object and the hands as constraints. The second step is to seek appropriate finger joint values based on randomly generated position samples. The robot selects these positions according to the displacements of each joint and k-means clustering. The simulation shows that the selected solutions for the manipulation all lie in the fingers' workspace.
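
A sketch of the sampling-and-clustering step under simplifying assumptions (a single planar two-link finger with unit link lengths and an illustrative cluster count): sample candidate fingertip positions, keep those with a valid closed-form IK solution, then use k-means to pick representative joint configurations.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def planar_2link_ik(x, y, l1=1.0, l2=1.0):
    """Closed-form IK for a planar 2-link finger; returns (q1, q2)
    or None if the target is out of reach."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        return None
    q2 = np.arccos(c2)
    q1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(q2),
                                       l1 + l2 * np.cos(q2))
    return q1, q2

# Randomly sample candidate fingertip positions, keep reachable ones.
samples = rng.uniform([-2, -2], [2, 2], size=(500, 2))
joints = np.array([q for p in samples
                   if (q := planar_2link_ik(*p)) is not None])

# Cluster the valid joint solutions and take the cluster centers as
# representative finger configurations (sketch of the k-means-based
# selection; the cluster count is illustrative).
centers = KMeans(n_clusters=4, n_init=10,
                 random_state=0).fit(joints).cluster_centers_
print(centers)
```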


The Hand-object Kinematic Model for Bimanual Manipulation

Li, Jingyi

arXiv.org Artificial Intelligence

This paper addresses planar finger kinematics for seeking optimized manipulation strategies. The first step is to build a model based on geometric features of linear and rotational motion so that the robot can select the finger configurations. This kinematic model considers the motion between the hands and the object. Based on two-finger manipulation cases, the model can output strategies for bimanual manipulation. To execute these strategies, the second step is to seek appropriate finger joint values according to the final orientation of the fingers. The simulation shows that the computed solutions can complete the relative rotation and linear motion of unknown objects.
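
The strategy computation reduces to rigid-body geometry: a desired object rotation and translation move each fingertip contact point, and the moved contacts become per-finger IK targets. A minimal sketch with made-up contact locations:

```python
import numpy as np

def contact_targets(contacts, center, theta, translation):
    """Rigid-body update of fingertip contact points for a desired
    object rotation theta (radians, about `center`) plus a translation.
    The returned positions become per-finger IK targets."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return (contacts - center) @ R.T + center + translation

# Two fingers grasping an object at opposite sides (illustrative).
contacts = np.array([[0.9, 0.0], [1.1, 0.0]])
center = np.array([1.0, 0.0])
new_contacts = contact_targets(contacts, center, theta=np.pi / 6,
                               translation=np.array([0.0, 0.1]))
print(new_contacts)  # feed these to each finger's IK
```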


High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning

Schulze, Lennart, Lipson, Hod

arXiv.org Artificial Intelligence

A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning in the absence of classical geometric kinematic models. In particular, when the latter are hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to model its own kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly broader applicability than existing approaches, which have depended on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on a motion planning task as an exemplary downstream application.
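
The query interface can be illustrated with a minimal configuration-conditioned density field; the paper's encoder-based architecture, positional encodings, and curricular sampling are all omitted, so the network below is a placeholder-scale sketch:

```python
import torch
import torch.nn as nn

class ConfigConditionedDensityField(nn.Module):
    """Minimal dynamic neural density field: maps a 3D query point and
    an n-DOF joint configuration to a non-negative occupancy density.
    Illustrates only the query interface of a self-model."""
    def __init__(self, n_dofs=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + n_dofs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # density >= 0
        )

    def forward(self, points, config):
        config = config.expand(points.shape[0], -1)
        return self.net(torch.cat([points, config], dim=-1))

field = ConfigConditionedDensityField()
points = torch.rand(1024, 3)   # query points in the workspace
config = torch.zeros(1, 7)     # a 7-DOF joint configuration
density = field(points, config)  # (1024, 1)
print(density.shape)
```

Downstream, a planner can query such a field at workspace points for candidate configurations to estimate where the robot's body will be, without an explicit kinematic model.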


Learning Bidirectional Action-Language Translation with Limited Supervision and Incongruent Input

Özdemir, Ozan, Kerzel, Matthias, Weber, Cornelius, Lee, Jae Hee, Hafez, Muhammad Burhan, Bruns, Patrick, Wermter, Stefan

arXiv.org Artificial Intelligence

Human infant learning happens during exploration of the environment, through interaction with objects, and by listening to and casually repeating utterances, which is analogous to unsupervised learning. Only occasionally does a learning infant receive a matching verbal description of an action it is performing, which is similar to supervised learning. Such a learning mechanism can be mimicked with deep learning. We model this weakly supervised learning paradigm with our Paired Gated Autoencoders (PGAE) model, which combines an action autoencoder and a language autoencoder. After observing a performance drop when reducing the proportion of supervised training, we introduce the Paired Transformed Autoencoders (PTAE) model, which uses Transformer-based crossmodal attention. PTAE achieves significantly higher accuracy in language-to-action and action-to-language translation, particularly in the realistic but difficult case when only few supervised training samples are available. We also test whether the trained model behaves realistically with conflicting multimodal input. In accordance with the concept of incongruence in psychology, conflict deteriorates the model output; conflicting action input has a more severe impact than conflicting language input, and more conflicting features lead to larger interference. PTAE can be trained on mostly unlabelled data where labelled data is scarce, and it behaves plausibly when tested with incongruent input.
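
Crossmodal attention of this kind can be sketched with a single attention block in which language tokens query action tokens; the dimensions, depth, and surrounding encoder-decoder structure below are placeholders rather than PTAE's actual configuration:

```python
import torch
import torch.nn as nn

class CrossmodalFusion(nn.Module):
    """One crossmodal attention block: language tokens attend over
    action tokens, so each word is informed by the observed motion
    (illustrative of Transformer-based crossmodal fusion)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lang_tokens, action_tokens):
        fused, _ = self.attn(query=lang_tokens,
                             key=action_tokens, value=action_tokens)
        return self.norm(lang_tokens + fused)

fusion = CrossmodalFusion()
lang = torch.randn(2, 10, 128)   # batch of 10 language tokens
act = torch.randn(2, 50, 128)    # batch of 50 action-frame tokens
out = fusion(lang, act)
print(out.shape)  # torch.Size([2, 10, 128])
```

Swapping query and key/value roles gives the opposite translation direction, which is how one block pattern can serve both action-to-language and language-to-action.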